Analysis of group evolution prediction in complex networks
In a world in which acceptance by and identification with social
communities are highly desired, the ability to predict the evolution of groups
over time appears to be a vital but very complex research problem. Therefore, we
propose a new, adaptable, generic and multi-stage method for Group Evolution
Prediction (GEP) in complex networks that facilitates reasoning about the
future states of recently discovered groups. The precise GEP modularity
enabled us to carry out extensive and versatile empirical studies on many
real-world complex / social networks to analyze the impact of numerous setups
and parameters like time window type and size, group detection method,
evolution chain length, prediction models, etc. Additionally, many new
predictive features reflecting the group state at a given time have been
identified and tested. Some other research problems like enriching learning
evolution chains with external data have been analyzed as well.
Towards equilibrium molecular conformation generation with GFlowNets
Sampling diverse, thermodynamically feasible molecular conformations plays a
crucial role in predicting properties of a molecule. In this paper we propose
to use GFlowNet for sampling conformations of small molecules from the
Boltzmann distribution, as determined by the molecule's energy. The proposed
approach can be used in combination with energy estimation methods of different
fidelity and discovers a diverse set of low-energy conformations for highly
flexible drug-like molecules. We demonstrate that GFlowNet can reproduce
molecular potential energy surfaces by sampling proportionally to the Boltzmann
distribution.
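The target distribution named in this abstract can be illustrated with a short sketch. It does not implement GFlowNet; it only computes Boltzmann weights, p_i proportional to exp(-E_i / kT), for a list of conformer energies and draws indices in proportion to them. The function names, the kcal/mol energy unit and the 298.15 K default temperature are assumptions made for the example, not details from the paper.

```python
import math
import random

# Boltzmann constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
K_B_KCAL_PER_MOL_K = 0.0019872041


def boltzmann_weights(energies, temperature=298.15):
    """Return normalized Boltzmann probabilities for a list of energies (kcal/mol)."""
    kt = K_B_KCAL_PER_MOL_K * temperature
    # Subtract the minimum energy before exponentiating for numerical stability;
    # the shift cancels out after normalization.
    e_min = min(energies)
    unnormalized = [math.exp(-(e - e_min) / kt) for e in energies]
    partition = sum(unnormalized)
    return [u / partition for u in unnormalized]


def sample_conformers(energies, n, temperature=298.15, seed=0):
    """Draw n conformer indices with probability proportional to their Boltzmann weight."""
    rng = random.Random(seed)
    weights = boltzmann_weights(energies, temperature)
    return rng.choices(range(len(energies)), weights=weights, k=n)
```

Calling `boltzmann_weights([0.0, 1.0, 1.0])` gives the lowest-energy conformer the largest probability, and the two conformers with equal energy receive equal probability.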
Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets
Recently, pre-trained foundation models have enabled significant advancements
in multiple fields. In molecular machine learning, however, where datasets are
often hand-curated, and hence typically small, the lack of datasets with
labeled features, and codebases to manage those datasets, has hindered the
development of foundation models. In this work, we present seven novel datasets
categorized by size into three distinct categories: ToyMix, LargeMix and
UltraLarge. These datasets push the boundaries in both the scale and the
diversity of supervised labels for molecular learning. They cover nearly 100
million molecules and over 3000 sparsely defined tasks, totaling more than 13
billion individual labels of both quantum and biological nature. In comparison,
our datasets contain 300 times more data points than the widely used OGB-LSC
PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In
addition, to support the development of foundational models based on our
proposed datasets, we present the Graphium graph machine learning library which
simplifies the process of building and training molecular machine learning
models for multi-task and multi-level molecular datasets. Finally, we present a
range of baseline results as a starting point of multi-task and multi-level
training on these datasets. Empirically, we observe that performance on
low-resource biological datasets improves when also training on large
amounts of quantum data. This indicates that there may be potential in
multi-task and multi-level training of a foundation model and fine-tuning it to
resource-constrained downstream tasks.
CCR: A combined cleaning and resampling algorithm for imbalanced data classification
Imbalanced data classification is one of the most widespread challenges in contemporary pattern recognition. Varying levels of imbalance may be observed in most real datasets, affecting the performance of classification algorithms. In particular, high levels of imbalance pose serious difficulties, often requiring the use of specially designed methods. In such cases the most important issue is often to properly detect minority examples, but at the same time the performance on the majority class cannot be neglected. In this paper we describe a novel resampling technique focused on proper detection of minority examples in a two-class imbalanced data task. The proposed method combines cleaning the decision border around minority objects with guided synthetic oversampling. Results of the conducted experimental study indicate that the proposed algorithm usually outperforms conventional oversampling approaches, especially when the detection of minority examples is considered.
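The general idea of pairing border cleaning with synthetic oversampling can be sketched as follows. This is a deliberately simplified illustration and not the CCR algorithm itself: the abstract does not specify how cleaning or guidance is performed, so the fixed `radius`, the removal of nearby majority points, and the uniform jitter around minority points are all assumptions of this example, as is the function name.

```python
import math
import random


def clean_and_oversample(majority, minority, radius=1.0, seed=0):
    """Simplified two-step resampling for a two-class problem.

    majority, minority: lists of equal-length coordinate tuples.
    Returns (cleaned_majority, resampled_minority).
    """
    rng = random.Random(seed)

    # Step 1 (cleaning, assumed rule): drop majority points that lie within
    # `radius` of any minority point, clearing the border around the minority class.
    cleaned = [
        p for p in majority
        if all(math.dist(p, q) > radius for q in minority)
    ]

    # Step 2 (oversampling, assumed rule): add synthetic minority points until
    # both classes have the same size. Each synthetic point is a random minority
    # point jittered inside a cube that fits within the cleaned sphere.
    synthetic = []
    n_needed = max(0, len(cleaned) - len(minority))
    for _ in range(n_needed):
        base = rng.choice(minority)
        step = radius / math.sqrt(len(base))
        synthetic.append(tuple(x + rng.uniform(-step, step) for x in base))

    return cleaned, list(minority) + synthetic
```

Because the jitter is bounded by `radius / sqrt(d)` per coordinate, every synthetic point stays within `radius` of the minority point it was generated from, which is the region the cleaning step has just emptied of majority points.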
Using Training Curriculum with Deep Reinforcement Learning. On the Importance of Starting Small
Reinforcement learning algorithms are being used to solve problems of ever-increasing complexity. As a consequence, the training process becomes harder and more computationally demanding. Using transfer learning can partially alleviate this issue by taking advantage of previously acquired knowledge. In this paper we propose a novel test environment and experimentally evaluate the impact of using a training curriculum with the deep Q-learning algorithm.
Impact of Low Resolution on Image Recognition with Deep Neural Networks: An Experimental Study
Due to the advances made in recent years, methods based on deep neural networks have been able to achieve state-of-the-art performance in various computer vision problems. In some tasks, such as image recognition, neural-based approaches have even been able to surpass human performance. However, the benchmarks on which neural networks achieve these impressive results usually consist of fairly high quality data. In practical applications, on the other hand, we are often faced with images of low quality, affected by factors such as low resolution, presence of noise or a small dynamic range. It is unclear how resilient deep neural networks are to the presence of such factors. In this paper we experimentally evaluate the impact of low resolution on the classification accuracy of several notable neural architectures of recent years. Furthermore, we examine the possibility of improving neural networks’ performance in the task of low resolution image recognition by applying super-resolution prior to classification. The results of our experiments indicate that contemporary neural architectures remain significantly affected by low image resolution. By applying super-resolution prior to classification we were able to alleviate this issue to a large extent as long as the resolution of the images did not decrease too severely. However, in the case of very low resolution images the classification accuracy remained considerably affected.
A Multiresolution Grid Structure Applied to Seafloor Shape Modeling
This paper proposes a method of creating a multiresolution depth grid containing bathymetric data describing a stretch of sea floor. The included literature review presents current solutions for creating digital terrain models (DTMs), focusing on methods employing regular grids, with a discussion of the strong and weak points of such an approach. As a basis for the investigations, some important recommendations from the International Hydrographic Organization concerning the accuracy of created models are provided. The authors propose a novel method of storing DTM data, involving multiresolution depth grids. The paper presents the characteristics of this method, numerical algorithms for conversion between a regular grid and the multiresolution one, and experiments on typical seafloor surfaces. The results are discussed, focusing on the data reduction rate and the variable resolution of the grid structure. The proposed method can be applied in Geographical Information Systems, especially for the purposes of solving sea survey problems.
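One possible way to picture a conversion between a regular grid and a multiresolution one is a quadtree-style subdivision, sketched below. The abstract does not describe the authors' data structure or conversion criterion, so the quadtree layout, the `tolerance` rule (merge a block when its depth range is small enough), and the function names are all assumptions of this example. The grid is assumed to be square with a power-of-two side length.

```python
def build_multires(grid, row, col, size, tolerance):
    """Cover a square block of a regular depth grid with variable-size cells.

    Returns a list of (row, col, size, mean_depth) tuples. A block is kept as a
    single coarse cell when its depth range does not exceed `tolerance`;
    otherwise it is split into four quadrants (assumed criterion).
    """
    values = [
        grid[r][c]
        for r in range(row, row + size)
        for c in range(col, col + size)
    ]
    if size == 1 or max(values) - min(values) <= tolerance:
        return [(row, col, size, sum(values) / len(values))]

    half = size // 2
    cells = []
    for d_row in (0, half):
        for d_col in (0, half):
            cells.extend(
                build_multires(grid, row + d_row, col + d_col, half, tolerance)
            )
    return cells


def to_regular(cells, n):
    """Expand multiresolution cells back into an n x n regular grid."""
    grid = [[0.0] * n for _ in range(n)]
    for row, col, size, depth in cells:
        for r in range(row, row + size):
            for c in range(col, col + size):
                grid[r][c] = depth
    return grid
```

On a surface that is flat in most places, the number of stored cells is smaller than the number of regular grid nodes, which is the kind of data reduction the abstract refers to; the tolerance bounds the depth error introduced in each merged block.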